Evaluating kernels on Xeon Phi to accelerate Gysela application

نویسندگان

  • Guillaume Latu
  • Matthieu Haefele
  • Julien Bigot
  • Virginie Grandgirard
  • Thomas Cartier-Michaud
  • Fabien Rozar
چکیده

This work describes the challenges presented by porting parts of the gysela code to the Intel Xeon Phi coprocessor, as well as techniques used for optimization, vectorization and tuning that can be applied to other applications. We evaluate the performance of some generic micro-benchmark on Phi versus Intel Sandy Bridge. Several interpolation kernels useful for the gysela application are analyzed and the performance are shown. Some memory-bound and compute-bound kernels are accelerated by a factor 2 on the Phi device compared to Sandy architecture. Nevertheless, it is hard, if not impossible, to reach a large fraction of the peek performance on the Phi device, especially for real-life applications as gysela. A collateral benefit of this optimization and tuning work is that the execution time of Gysela (using 4D advections) has decreased on a standard architecture such as Intel Sandy Bridge.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Optimization of stencil-based fusion kernels on Tera-flops many-core architectures

We present the optimization of kernels from fusion plasma codes, GYSELA and GT5D, on Tera-flops many-core architectures including accelerators (Xeon Phi, TeslaK20X), and CPUs (FX100). Through the optimization, we found that the structure of array (SoA) style implementation is effective for SIMD operations on all architectures, and high cache locality, which is achieved in GYSELA, is of critical...

متن کامل

Porting FEASTFLOW to the Intel Xeon Phi: Lessons Learned

In this paper we report our experiences in porting the FEASTFLOW software infrastructure to the Intel Xeon Phi coprocessor. Our efforts involved both the evaluation of programming models including OpenCL, POSIX threads and OpenMP and typical optimization strategies like parallelization and vectorization. Since the straightforward porting process of the already existing OpenCL version of the cod...

متن کامل

Wilson and Domainwall Kernels on Oakforest-PACS

We report the performance of Wilson and Domainwall Kernels on a new Intel Xeon Phi Knights Landing based machine named Oakforest-PACS, which is co-hosted by University of Tokyo and Tsukuba University and is currently fastest in Japan. This machine uses Intel Omni-Path for the internode network. We compare performance with several types of implementation including that makes use of the Grid libr...

متن کامل

Performance Evaluation of Breadth-First Search on Intel Xeon Phi

Breadth-First Search (BFS) is one of the most important kernels in graph computing. It is the main kernel of the Graph500 rating that evaluates performance of large supercomputers and multiprocessor nodes in terms of traversed edges per second (TEPS). In this paper we present the results of BFS performance evaluation on a recently released high-performance Intel Xeon Phi coprocessor. We examine...

متن کامل

Profiling of Code_Saturne with HPCToolkit and TAU, and autotuning Kernels with Orio

This study has profiled the application Code Saturne, which is part of the PRACE benchmark suite. The profiling has been carried out with the tools HPCtookit and Tuning and Analysis Utilities (TAU) with the target of finding compute kernels suitable for autotuning. Autotuning is regarded as a necessary step in achieving sustainable performance at an Exascale level as Exascale systems most likel...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1503.04645  شماره 

صفحات  -

تاریخ انتشار 2015